OPTIMIZED FFT/IFFT MODULE 



Field of Invention 

[0001] This invention relates to OFDM systems and, more particularly, to 
an optimized hardware implementation of the FFT/IFFT module which minimizes 
the number of clock cycles for computing the FFT/IFFT of a signal. 

Background of the Invention 

[0002] Wireless LAN (WLAN) technology is one of the most widely 
deployed and most rapidly expanding areas of radio communications. As 
demand for mobile data grows, networks will have to offer more bandwidth to 
support both a larger number of users and higher data transfer rates for
individual users. Satisfying these demands involves the deployment of newer air
interface technologies such as 3G cellular and the IEEE 802.11a standard.

[0003] The IEEE 802.11a standard is based on a multicarrier modulation
scheme called orthogonal frequency division multiplexing (OFDM) in the 5 GHz
band. In multicarrier modulation, data signals (bits) are modulated onto a
number of carriers rather than onto a single carrier as in traditional AM or FM
systems. The basic principle of OFDM is to split a high rate data stream into a
number of lower rate streams, which are then transmitted simultaneously over a
number of sub-carriers (overlapping, orthogonal narrow band signals). Because
the frequencies used in OFDM are orthogonal, neighboring frequencies with
overlapping spectra can be used, which results in more efficient use of
bandwidth. OFDM is therefore able to provide higher data rates for the same
bandwidth. It also offers several advantages over single carrier systems, such as
better immunity to multi-path effects, simpler channel equalization and relaxed
timing acquisition constraints. Accordingly, OFDM has become the modulation
method of choice for many new systems.

[0004] Each sub-carrier in OFDM has a fixed phase and amplitude for a 
certain time duration, during which a small portion of the information is carried. 
This unit of data is called a symbol and the time period during which the symbol 
is available is called the symbol duration. After that time period, the modulation 
is changed and the next symbol carries the next portion of information. A set of 
orthogonal sub-carriers together forms an OFDM symbol. To avoid inter-symbol
interference (ISI) due to multi-path propagation, successive OFDM symbols are
separated by a guard interval. This makes the OFDM system resistant to
multi-path effects. Although OFDM has been in existence for a long time, recent
developments in DSP and VLSI technologies have made it a feasible option. As
a result, OFDM is fast gaining popularity in broadband standards and high-speed
wireless LAN standards such as IEEE 802.11a.

[0005] In practice, the most efficient way to generate the sum of a large 
number of sub-carriers is by using the Inverse Fast Fourier Transform (IFFT). At 
the receiver side, a fast and efficient implementation of the well-known discrete
Fourier transform (DFT), called the Fast Fourier Transform (FFT), can be
used to demodulate all the sub-carriers. All sub-carriers differ by an integer 
number of cycles within the FFT integration time, which ensures the orthogonality 
between different sub-carriers. 

[0006] Several choices are available for implementing an OFDM modem: 
digital signal processing (DSP) based implementation, DSP-based 
implementation with hardware accelerators or a complete ASIC implementation. 

[0007] High performance digital signal processors (DSPs) are widely 
available in the market today. The computation intensive and time critical 
functions that were traditionally implemented in hardware are nowadays being 
implemented in software running on these processors. However, a DSP-based
implementation of an OFDM modem has the disadvantage of not being optimal
in terms of chip area occupied and power consumption.

[0008] To overcome limitations incurred with a DSP-based implementation 
while still retaining the flexibility of a software implementation, some blocks of an 
OFDM transceiver can be implemented in hardware. Alternatively, the entire 
functionality may be implemented in hardware. Advantages of this ASIC-based 
approach include lower gate count and hence, lower cost and lower power 
consumption. 

[0009] When general purpose DSP chips do not meet the required 
performance parameters of an application, an ASIC (application specific 
integrated circuit) DSP may be developed. When a particular algorithm has to be 
implemented, for example the FFT/IFFT algorithm, an application specific DSP 
chip is generated with an architectural structure dependent upon the algorithm's 
computational structure. Alternatively, the algorithm can be restructured to better 
fit an available target architecture (for example, that of a parallel computational 
arrangement). Most current implementations of the FFT/IFFT engine for an 
OFDM modem are done using a DSP chip with software and concentrate on 
minimizing calls to the multiplier block. 

[00010] However, it would be advantageous to implement an FFT/IFFT
engine entirely in ASIC technology so that each of the functional blocks of the
FFT/IFFT engine can be mapped onto dedicated, parallel hardware resources,
thereby avoiding the difficult programming and optimization challenges of
scheduling time-critical operations through a single DSP core. An optimized
hardware implementation which minimizes the total run time while at the same
time minimizing the number of complex multipliers is, therefore, sought.






SUMMARY OF THE INVENTION 



[00011] The present invention pertains to symbolic or mathematical
manipulation of the FFT/IFFT formula in order to derive an optimal hardware
implementation. The invention involves restructuring the FFT/IFFT formula to
minimize the number of clock cycles required to compute the FFT/IFFT while at
the same time minimizing the number of complex multipliers required.

[00012] According to one embodiment of the present invention, a system
for performing an N-point FFT/IFFT operation is provided comprising: an input
module for receiving a plurality of inputs in parallel and for combining said inputs
after applying a multiplication factor to each of said inputs; at least one
multiplicand generator for providing multiplicands to said system; at least two
multiplier modules for performing complex multiplications, at least one of said
multiplier modules receiving an output of said input module, each of said
multiplier modules receiving multiplicands from said at least one multiplicand
generator, and at least one of said multiplier modules receiving an output of another
multiplier module; a map module for receiving outputs of all of said at least two
multiplier modules, said map module selecting and applying a multiplication
factor to each of said outputs of said at least two multiplier modules and
generating multiple outputs; and an accumulation module for receiving
and accumulating said multiple outputs of said map module.

[00013] In accordance with an aspect of the present invention, an N-point
FFT/IFFT operation, with N being the number of input samples, may be
performed in N clock cycles using (N/16) + 1 complex multipliers. In accordance
with a preferred aspect of the present invention, an N-point FFT/IFFT operation is
performed in N clock cycles using (N/32) + 1 complex multipliers. Accordingly, in a
preferred implementation of the present invention, an optimized hardware
configuration comprising 3 complex multipliers is used to compute a 64-point
FFT/IFFT operation in 64 clock cycles. Advantageously, the total number of
clock cycles required to complete the FFT/IFFT operation is minimized while at
the same time minimizing the number of complex multipliers needed.

[00014] The advantage of implementing an FFT/IFFT engine with ASIC 
technology is that each of the functional blocks of the FFT/IFFT engine can be
mapped onto dedicated, parallel hardware resources, thereby avoiding the
difficult programming and optimization challenges of scheduling time-critical 
operations through a single DSP core. 

[00015] Other aspects and features of the present invention will become
apparent to those ordinarily skilled in the art upon review of the following 
description of specific embodiments of the invention in conjunction with the 
accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[00016] A better understanding of the invention will be obtained by
considering the detailed description below, with reference to the following 
drawings in which: 

FIG. 1 depicts a brute force hardware implementation for the FFT/IFFT operation; 
FIG. 2 depicts a partially optimized hardware implementation for the FFT/IFFT 
operation; 

FIG. 3 depicts a fully optimized hardware implementation for the FFT/IFFT 
operation according to the present invention; 

FIG. 4 depicts an example of the logic flow undertaken by the MAP module of 
FIG. 3 in accordance with one aspect of the present invention; and 
FIG. 5 depicts the general operation of the accumulation module of FIG. 3 in 
accordance with one aspect of the present invention. 






DESCRIPTION OF THE PREFERRED EMBODIMENT 



[00017] The basic principle of OFDM is to split a high rate data stream into
a number of lower rate streams, each of which is transmitted simultaneously
over a number of sub-carriers. In the IEEE 802.11a standard OFDM modulation
scheme, the binary serial signal is divided into groups (symbols) of one, two, four
or six bits, depending on the data rate chosen, and the symbols are converted
into complex numbers representing applicable constellation points. Each such
symbol is assigned to a particular sub-carrier for the duration of one OFDM
symbol (4 microseconds). An Inverse Fast Fourier Transform (IFFT) combines the
sub-carriers to form a composite time-domain signal for transmission. The IEEE
802.11a standard system uses 52 sub-carriers that are modulated using binary or
quadrature phase shift keying (BPSK/QPSK), 16 Quadrature Amplitude 
Modulation (QAM) or 64 QAM. On the receiver side, the Fast Fourier Transform 
(FFT) can be used to demodulate all sub-carriers. All sub-carriers differ by an 
integer number of cycles within the FFT integration time, and this ensures the 
orthogonality between the different sub-carriers. 

[00018] The heart of an OFDM baseband processor is, therefore, the 
FFT/IFFT engine. It is well known that the FFT operation is designed to perform 
complex multiplications and additions, even though the input data may be real 
valued. The reason for this situation is that the phase factors are complex and, 
hence, after the first stage of the operation all variables are complex-valued. 
Thus, in terms of a hardware implementation, the FFT operation can be 
implemented using summation modules and multiplication modules (multipliers). 

[00019] Multiplication modules are the most widely used circuits in an OFDM
modem. However, multipliers are costly resources both in terms of chip area and
power consumption. A greater number of multipliers will require greater chip
area, resulting in bulkier devices not suitable for mobile applications. However,
the total time it takes for an FFT/IFFT engine to operate on a given set of input
samples (i.e. the total run time) is also critical, since fewer clock cycles mean
lower power consumption. With regard to the FFT/IFFT engine, it would
therefore be desirable to reduce the number of multiplier modules required
while at the same time minimizing the number of clock cycles required to
compute the FFT/IFFT.

[00020] The present invention pertains to symbolic or mathematical 
manipulation of the FFT formula in order to derive an optimal hardware 
implementation. The invention involves restructuring the FFT formula to 
minimize the number of clock cycles required while at the same time minimizing 
the number of complex multiplier modules. Since both the FFT and IFFT
operations involve the same type of computations, only a discussion of the IFFT
is presented. Those skilled in the art will appreciate that the formulation 
presented applies equally to an efficient implementation of the FFT operation. 

[00021] The computational problem for the IFFT is to compute the
sequence Y(n) of N complex-valued numbers given another sequence of data
X(k) according to the formula

$$Y(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j2\pi kn/N}, \qquad 0 \le n \le N-1 \qquad \text{equation (1)}$$

In the above formulation, one can see that for each sample n, direct computation
of Y(n) involves N complex multiplications (4N real multiplications).
Consequently, to compute the IFFT of all N samples, the IFFT requires N²
complex multiplications.

[00022] FIG. 1 depicts one possible hardware implementation of equation
(1). In FIG. 1, N complex-valued numbers defining the input sequence X(k) are
fed into a multiplexer (MUX) 120. The MUX 120 selects one of the N
complex-valued inputs and delivers it to a complex multiplier 140. The complex
multiplier 140 is adapted to access a look-up table (LUT) 150 which contains the
values $e^{j2\pi kn/N}$ for $0 \le k \le N-1$ and $0 \le n \le N-1$. The output of the
complex multiplier 140 is fed to an accumulation module 180 which may
comprise a register 160. Using a single complex multiplier 140 as in FIG. 1, it is
readily seen that the computation of each output sample requires N complex
multiplications and, hence, N uses of the complex multiplier 140. In
other words, to compute each output time sample, the results of N complex
multiplications are added and accumulated in the register. This process must be
repeated for each of the N output time samples. Since N output samples in
total need to be computed, this results in a total run time of N² clock cycles
(assuming one complex multiplication per clock cycle) to compute the entire
output sequence Y(n).
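
The cycle count of FIG. 1 can be made concrete with a minimal Python sketch. It assumes one complex multiplication per clock cycle and computes the LUT 150 entries on the fly with `cmath.exp` instead of storing them; the function name `brute_force_ifft` is hypothetical and the code is a behavioural model, not a hardware description.

```python
import cmath


def brute_force_ifft(x):
    """Sketch of FIG. 1: one MUX, one complex multiplier, one accumulator.

    Returns (y, cycles), assuming one complex multiplication per clock cycle.
    """
    n_pts = len(x)
    y = []
    cycles = 0
    for n in range(n_pts):                # one output sample at a time
        acc = 0j                          # plays the role of register 160
        for k in range(n_pts):            # MUX 120 selects X(k)
            twiddle = cmath.exp(2j * cmath.pi * k * n / n_pts)  # LUT 150 entry
            acc += x[k] * twiddle         # complex multiplier 140
            cycles += 1
        y.append(acc / n_pts)             # 1/N scaling from equation (1)
    return y, cycles


y, cycles = brute_force_ifft([0, 1, 0, 0])
print(cycles)  # 16, i.e. N**2 for N = 4
```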

[00023] However, computation of the IFFT using the brute force hardware 
implementation of FIG. 1 is inefficient primarily because it does not exploit the 
symmetry and periodicity properties of the phase factor $e^{j\theta}$ in equation (1). The
present invention exploits these properties to minimize the total run time (number 
of clock cycles) for computing the IFFT/FFT of a given set of sample data. 

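[00024] Those skilled in the art will appreciate that, by splitting the input
index of equation (1) into $\frac{N}{4}k + \ell$ with $0 \le k \le 3$ and
$0 \le \ell \le N/4 - 1$, equation (1) may be rewritten as the expansion

$$Y(n) = \frac{1}{N}\sum_{\ell=0}^{N/4-1}\sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+\ell\right) e^{j2\pi\left(\frac{N}{4}k+\ell\right)n/N} \qquad \text{equation (1a)}$$

or

$$Y(n) = \frac{1}{N}\left[ e^{j2\pi \cdot 0 \cdot n/N}\sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+0\right) e^{jk\pi n/2} + e^{j2\pi \cdot 1 \cdot n/N}\sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+1\right) e^{jk\pi n/2} + \cdots \right] \qquad \text{equation (1b)}$$

Simplifying the above yields

$$Y(n) = \frac{1}{N}\sum_{\ell=0}^{N/4-1} e^{j2\pi\ell n/N} \sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+\ell\right) e^{jk\pi n/2} \qquad \text{equation (2)}$$

If we let

$$P_\ell(n) = e^{j2\pi\ell n/N} \qquad \text{equation (3)}$$

and

$$G_\ell(n) = \sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+\ell\right) e^{jk\pi n/2} \qquad \text{equation (4)}$$

the set of output samples may be rewritten as

$$Y(n) = \frac{1}{N}\sum_{\ell=0}^{N/4-1} P_\ell(n)\,G_\ell(n) \qquad \text{equation (5)}$$

Letting

$$R_\ell(n) = P_\ell(n)\,G_\ell(n) \qquad \text{equation (6)}$$

equation (5) may be rewritten as

$$Y(n) = \frac{1}{N}\sum_{\ell=0}^{N/4-1} R_\ell(n) \qquad \text{equation (7)}$$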
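
The regrouping in equations (1a) through (7) can be checked numerically. The Python sketch below is only a floating-point reference model, assuming N is a multiple of 4; the function names `direct_ifft`, `p_factor`, `g_value` and `ifft_via_r` are hypothetical. It evaluates Y(n) directly from equation (1) and again through $R_\ell(n) = P_\ell(n)G_\ell(n)$.

```python
import cmath


def direct_ifft(x):
    """Equation (1), evaluated directly as a reference."""
    n_pts = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def p_factor(n_pts, ell, n):
    """Equation (3): P_l(n) = e^{j 2 pi l n / N}."""
    return cmath.exp(2j * cmath.pi * ell * n / n_pts)


def g_value(x, ell, n):
    """Equation (4): combine X(l), X(l+N/4), X(l+2N/4), X(l+3N/4)."""
    quarter = len(x) // 4
    return sum(x[quarter * k + ell] * cmath.exp(1j * k * cmath.pi * n / 2) for k in range(4))


def ifft_via_r(x):
    """Equations (6) and (7): Y(n) = (1/N) * sum over l of P_l(n) * G_l(n)."""
    n_pts = len(x)
    return [sum(p_factor(n_pts, ell, n) * g_value(x, ell, n) for ell in range(n_pts // 4)) / n_pts
            for n in range(n_pts)]


x = [complex(k, 1 - k) for k in range(16)]
error = max(abs(a - b) for a, b in zip(ifft_via_r(x), direct_ifft(x)))
print(error < 1e-9)  # True
```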

[00025] FIG. 2 depicts a hardware implementation for equation (5) above.
Incoming complex numbers 216 arrive in groups of four at input ports (K0, K1, K2
and K3) of a $G_\ell(n)$ module 220. Assuming N input samples, the four incoming
complex samples of each group will have indices N/4 apart. For example, the first
group of incoming samples would be X(0), X(0+N/4), X(0+2N/4) and X(0+3N/4).
Similarly, the second group of incoming samples would be X(1), X(1+N/4),
X(1+2N/4) and X(1+3N/4). The output of the $G_\ell(n)$ module 220 is delivered to a
complex multiplier 240 which is adapted to access a look-up table (LUT) 230
containing complex-valued constants $P_\ell(n)$ as defined by equation (3). The
output $R_\ell(n)$ of the complex multiplier 240 is the product of $G_\ell(n)$ with $P_\ell(n)$ as
defined by equation (6). This product is sent to the accumulation module 250
which may comprise a register 260 as shown.

[00026] Examining equation (4), it may be shown that the function of the
$G_\ell(n)$ module 220 is simply to take the four incoming complex numbers 216 (with
indices N/4 apart), multiply each one by a constant $e^{jk\pi n/2}$ and add them all.
Because $e^{jk\pi n/2} = j^{kn}$ for integers k and n, the value of this constant
reduces to +1, -1, +j or -j depending on the values of k and the sample number n.
Therefore, no complex multiplications are conducted in this module.
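
Because each factor is a power of j, a G module can be modelled with swaps and sign changes alone. A minimal Python sketch, assuming the four inputs are supplied in port order K0 to K3; `rotate_quarter` and `g_module` are hypothetical names used only for illustration.

```python
def rotate_quarter(z, r):
    """Return z * j**r using only swaps and sign changes (no multiplier)."""
    z = complex(z)
    r %= 4
    if r == 0:
        return z
    if r == 1:
        return complex(-z.imag, z.real)   # j(a + jb) = -b + ja
    if r == 2:
        return complex(-z.real, -z.imag)  # -(a + jb)
    return complex(z.imag, -z.real)       # -j(a + jb) = b - ja


def g_module(inputs, n):
    """G_l(n) from the four samples X(l), X(l+N/4), X(l+2N/4), X(l+3N/4).

    The factor on port k is e^{j k pi n / 2} = j**(k*n), so only additions are needed.
    """
    total = 0j
    for k, value in enumerate(inputs):
        total += rotate_quarter(value, k * n)
    return total


print(g_module([1, 2, 3, 4], 1))  # (-2-2j)
```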

[00027] Considering the implementation in FIG. 2 and keeping equation (5)
in mind, those skilled in the art will appreciate that N/4 complex multiplications of
$P_\ell(n) \times G_\ell(n)$ are required for the computation of each output sample. The
results of these multiplications may then be added together in the accumulation
module 250 to obtain each output sample. Therefore, a single output is
generated every N/4 clock cycles. Since N outputs need to be computed, the
total run time required to compute the FFT/IFFT for a set of N input samples
using the implementation in FIG. 2 with one complex multiplier has been reduced
from N² clock cycles to N²/4 clock cycles. Although the reduction in total run time
from N² clock cycles in FIG. 1 to N²/4 clock cycles in FIG. 2 is an improvement,
further optimization may be made by exploiting the periodicity of the phase factor
$e^{j\theta}$ in the functions $P_\ell(n)$ and $G_\ell(n)$.

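For example, substituting (n + 4) for n in equation (4) for $G_\ell(n)$ yields

$$G_\ell(n+4) = \sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+\ell\right) e^{jk\pi(n+4)/2} = \sum_{k=0}^{3} X\!\left(\tfrac{N}{4}k+\ell\right) e^{jk\pi n/2}\, e^{j2\pi k} \qquad \text{equation (8)}$$

or, since $e^{j2\pi k} = 1$ for every integer k,

$$G_\ell(n+4) = G_\ell(n) \qquad \text{equation (9)}$$

Similarly, substituting (n + 4) for n in equation (3) for $P_\ell(n)$ yields

$$P_\ell(n+4) = e^{j2\pi\ell(n+4)/N} = e^{j2\pi\ell n/N + j8\pi\ell/N} = e^{j2\pi\ell n/N}\, e^{j8\pi\ell/N} \qquad \text{equation (10)}$$

or

$$P_\ell(n+4) = e^{j8\pi\ell/N}\, P_\ell(n) \qquad \text{equation (11)}$$

Substituting equation (11) and equation (9) into equation (6), it may be shown
that

$$R_\ell(n+4) = P_\ell(n+4)\,G_\ell(n+4) = e^{j8\pi\ell/N}\,P_\ell(n)\,G_\ell(n) = e^{j8\pi\ell/N}\,R_\ell(n) \qquad \text{equation (12)}$$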
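
Equations (9) and (12) can be spot-checked numerically. This Python sketch assumes N is a multiple of 4 and uses floating-point tolerances; the helper names are hypothetical and the check covers every $\ell$ and every n with n + 4 < N.

```python
import cmath


def g_value(x, ell, n):
    """Equation (4) for an input sequence x whose length N is a multiple of 4."""
    quarter = len(x) // 4
    return sum(x[quarter * k + ell] * cmath.exp(1j * k * cmath.pi * n / 2) for k in range(4))


def r_value(x, ell, n):
    """Equation (6): R_l(n) = P_l(n) * G_l(n)."""
    return cmath.exp(2j * cmath.pi * ell * n / len(x)) * g_value(x, ell, n)


def periodicity_holds(x, tol=1e-9):
    """Check equations (9) and (12) for every l and every n with n + 4 < N."""
    n_pts = len(x)
    for ell in range(n_pts // 4):
        rotation = cmath.exp(8j * cmath.pi * ell / n_pts)   # e^{j 8 pi l / N}
        for n in range(n_pts - 4):
            if abs(g_value(x, ell, n + 4) - g_value(x, ell, n)) > tol:
                return False
            if abs(r_value(x, ell, n + 4) - rotation * r_value(x, ell, n)) > tol:
                return False
    return True


print(periodicity_holds([complex(i % 5, (3 * i) % 7) for i in range(32)]))  # True
```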



[00028] The relationship defined by equation (12) states that, for a given
value of $\ell$, the function $R_\ell(n)$ for any given output sample is a phase rotation of
the same function's value four output samples earlier. Previously, each
output sample required N/4 computations of $R_\ell(n)$, which were then summed to
arrive at a given output sample. With $R_\ell(n)$ displaying the recursive relationship
defined by equation (12), each output sample still requires N/4 computations of
$R_\ell(n)$. However, once the values of $R_\ell(n)$ for the first four output samples,
i.e. Y(0), Y(1), Y(2) and Y(3), are computed in N clock cycles (i.e. N computations),
the values of $R_\ell(n)$ required to compute all other output samples are simply
phase rotations of the previously calculated $R_\ell(n)$ values. In other words, the
number of clock cycles required for the entire FFT/IFFT operation is reduced to N.

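[00029] In terms of simplifying a hardware implementation, a variable p can
be defined with p being a multiple of 4. Then, by applying equations (9) and (11)
p/4 times, the following relation can be shown to hold

$$R_\ell(n+p) = e^{j2\pi\ell p/N}\,R_\ell(n) \qquad \text{equation (13)}$$

Accordingly, equation (7) may be rewritten as

$$Y(n+p) = \frac{1}{N}\sum_{\ell=0}^{N/4-1} R_\ell(n+p) \qquad \text{equation (14)}$$

or

$$Y(n+p) = \frac{1}{N}\sum_{\ell=0}^{N/4-1} R_\ell(n)\, e^{j2\pi\ell p/N}, \qquad 0 \le n \le 3 \qquad \text{equation (15)}$$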
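
Equation (15) means every output can be produced from the $R_\ell(n)$ values of the first four counter states alone. A minimal Python sketch of that idea, assuming N is a multiple of 4; `ifft_from_first_four` is a hypothetical name and the phase factors are computed with `cmath.exp` rather than taken from a table.

```python
import cmath


def ifft_from_first_four(x):
    """Equation (15): Y(n + p) = (1/N) * sum_l R_l(n) * e^{j 2 pi l p / N}.

    Only R_l(n) for n = 0..3 is formed; every other output reuses it, rotated.
    """
    n_pts = len(x)
    quarter = n_pts // 4
    y = [0j] * n_pts
    for n in range(4):
        r = []
        for ell in range(quarter):          # equations (3), (4) and (6)
            g = sum(x[quarter * k + ell] * cmath.exp(1j * k * cmath.pi * n / 2) for k in range(4))
            r.append(cmath.exp(2j * cmath.pi * ell * n / n_pts) * g)
        for p in range(0, n_pts, 4):        # p = 0, 4, ..., N - 4
            y[n + p] = sum(r[ell] * cmath.exp(2j * cmath.pi * ell * p / n_pts)
                           for ell in range(quarter)) / n_pts
    return y
```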



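[00030] Those skilled in the art will appreciate that only the first N/16 products
in equation (15) require complex multiplications to be performed. Because p is a
multiple of 4, the exponent $\ell p$ is also a multiple of 4, so the phase factor
$e^{j2\pi\ell p/N}$ is always of the form $e^{j2\pi(4m)/N}$; and since increasing 4m
by N/4 multiplies the factor by $e^{j\pi/2} = j$, every such factor equals one of the
N/16 constants $e^{j2\pi(4m)/N}$, $0 \le m \le N/16 - 1$, multiplied by +1, -1, +j or -j.
For all other values of the exponent, the product in equation (15) can therefore
be found by a trivial multiplication of one of the first N/16 products. Accordingly,
(N/16) + 1 complex multipliers are now required to perform the FFT/IFFT
operation in N clock cycles. Although the number of clock cycles to perform the
FFT/IFFT operation has been reduced by a factor of N, from N² to N, this has
been at the expense of adding N/16 more complex multipliers. However, the
number of complex multipliers required may be further reduced using a very
useful property as described below.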
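
The N/16 count can be confirmed by enumerating the exponents. This Python sketch lists the distinct exponents $\ell p$ of equation (15), reduced modulo a quarter turn (N/4, i.e. modulo multiplication by +1, -1, +j or -j); each entry e stands for the constant $e^{j2\pi e/N}$, and entry 0 is the trivial constant 1. The function name is hypothetical.

```python
def distinct_rotations(n_pts):
    """Exponents l*p in equation (15), p a multiple of 4, reduced modulo N/4.

    Each entry e stands for the constant e^{j 2 pi e / N}; 0 means the constant 1.
    """
    quarter = n_pts // 4
    return sorted({(ell * p) % quarter for ell in range(quarter) for p in range(0, n_pts, 4)})


print(distinct_rotations(64))  # [0, 4, 8, 12]
```

For N = 64 the four entries correspond to 1, $e^{j4\pi/32}$, $e^{j8\pi/32}$ and $e^{j12\pi/32}$, the constants that reappear in the 64-point embodiment.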

[00031] In the general case, let us define a first complex number A = x + jy
with real part x and imaginary part y and a second complex number B = y + jx,
where B is the reflection of A about the 45-degree line in the unit circle. For
complex numbers A and B each multiplied by a third complex number
Z = R + jM, the following products are obtained:

$$A \times Z = (xR - My) + j(Ry + Mx) \qquad \text{equation (16)}$$

$$B \times Z = (Ry - Mx) + j(My + Rx) \qquad \text{equation (17)}$$

Examining equations (16) and (17), it is observed that all inner products (real
multiplications) for both complex multiplications may be obtained by carrying
out only one of the original complex multiplications. In other words, once
A × Z has been computed, no new multiplications are required to compute B × Z.
Computing B × Z is simply a matter of rearranging the way the different inner
products from A × Z are added or subtracted. This useful property, called Image
Multiplication (since B is a mirror image of A about the 45-degree line in the unit
circle), may be exploited to halve the number of complex multipliers determined
previously.
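
Image Multiplication can be expressed directly from equations (16) and (17). In the Python sketch below (a hypothetical helper, `image_multiply`), the four real products of A × Z are formed once and then recombined with different signs to obtain B × Z, so the second product costs only additions and subtractions.

```python
def image_multiply(a, z):
    """Return (A*Z, B*Z), where B is A reflected about the 45-degree line.

    Only the four real products of A*Z are formed; B*Z reuses them.
    """
    x, y = a.real, a.imag
    r, m = z.real, z.imag
    xr, my, ry, mx = x * r, m * y, r * y, m * x   # the four inner products
    a_times_z = complex(xr - my, ry + mx)          # equation (16)
    b_times_z = complex(ry - mx, my + xr)          # equation (17)
    return a_times_z, b_times_z


a, z = 0.3 + 0.7j, -1.2 + 2.5j
az, bz = image_multiply(a, z)
```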

[00032] Specifically, to compute all possible products $R_\ell(n)\,e^{j2\pi\ell p/N}$ in
equation (15), only N/32 multipliers are required. Since one additional multiplier is
required to compute $R_\ell(n)$ itself, the total number of complex multipliers required
is (N/32) + 1. Therefore, in accordance with an aspect of the present invention,
the total number of clock cycles required for computing an N-point FFT/IFFT can
be reduced by a factor of N, from N² to N, by using only (N/32) + 1 complex
multipliers.



[00033] According to an embodiment of the present invention, it is assumed
that a 64-point FFT/IFFT operation is required, i.e. N = 64. In this case, equation
(15) reduces to

$$Y(n+p) = \frac{1}{64}\sum_{\ell=0}^{15} R_\ell(n)\, e^{j\pi\ell p/32} \qquad \text{equation (18)}$$

where p is a multiple of 4 and 0 ≤ p ≤ 60. Examining equation (18) above and
noting that p is a multiple of 4, it is clear that in order to compute the IFFT/FFT,
the multiplication of $R_\ell(n)$ with only three complex numbers, $e^{j4\pi/32}$,
$e^{j8\pi/32}$ and $e^{j12\pi/32}$, is required. All other multiplications simply entail
multiplying one of these products (or $R_\ell(n)$ itself) by 1, -1, j or -j.
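
The claim for N = 64 can be checked exhaustively. In this Python sketch, assuming p is a multiple of 4, the hypothetical helper `split_factor` writes each phase factor $e^{j\pi\ell p/32}$ as a power of j times one of 1, $e^{j4\pi/32}$, $e^{j8\pi/32}$ or $e^{j12\pi/32}$.

```python
import cmath

# 1, e^{j4pi/32}, e^{j8pi/32}, e^{j12pi/32}: the constants of equation (18)
BASE = [cmath.exp(1j * 4 * b * cmath.pi / 32) for b in range(4)]
QUARTER_TURNS = [1, 1j, -1, -1j]           # j**t for t = 0..3


def split_factor(ell, p):
    """For N = 64 and p a multiple of 4, return (b, t) such that
    e^{j pi l p / 32} == QUARTER_TURNS[t] * BASE[b]."""
    m = (ell * (p // 4)) % 16              # angle in units of 4*pi/32 (16 units = 2*pi)
    return m % 4, m // 4


print(split_factor(3, 8))  # (2, 1): e^{j24pi/32} = j * e^{j8pi/32}
```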

[00034] FIG. 3 depicts an optimized hardware implementation 300 of the
64-point IFFT/FFT operation defined by equation (18) in accordance with one
aspect of the present invention. A first G module 310 having four input ports K0,
K1, K2 and K3 receives four complex-valued input samples and is adapted to
access a first look-up table (LUT) 316. An output $G_\ell(n)$ of the G module 310 is
delivered to a first complex multiplier module GX 320, which is adapted to access
a second look-up table (LUT) 326. The output $R_\ell(n)$ of the first complex
multiplier module GX 320 is delivered to a MAP module 360. The output $R_\ell(n)$ of
the first complex multiplier module GX 320 is further routed to a second complex
multiplier module GX1 330 and to a third complex multiplier module GX2 340.
The second multiplier module GX1 330 is adapted to access a first storage unit
336 containing a predefined complex-valued constant and delivers its output to
the MAP module 360. Similarly, the third multiplier module GX2 340 is adapted
to access a second storage unit 346 containing a predefined complex-valued
constant and delivers two outputs to the MAP module 360. The MAP module
360 generates a set of sixteen outputs 370, which are subsequently delivered to
an accumulation module 380. The accumulation module 380 generates a set of
sixteen outputs 390 corresponding to sixteen output time/frequency samples. In
the implementation of FIG. 3, therefore, sixteen output samples are generated at
a time from sixty-four input samples.

[00035] The G module 310 is the first module to receive incoming complex
numbers. As in FIG. 2, the G module 310 has four input ports (K0, K1, K2 and
K3) and simply takes four incoming complex-valued samples 302 with indices
sixteen (N/4) apart, multiplies each one by a constant (+j, -j, +1 or -1) and
adds them to form the output $G_\ell(n)$.

[00036] Specifically, four new complex numbers are latched into this
module during each clock cycle. In order to load all 64 input samples for a
64-point IFFT/FFT computation, sixteen clock cycles are required. Since sixteen
output samples 390 are generated at the output of the entire IFFT/FFT block in
each pass, the entire process is repeated 4 times to produce all 64 output
samples. A counter state n can thus be defined, where n = 0, 1, 2, 3 corresponds
to the computation of each set of sixteen output samples.

[00037] Once four complex numbers are loaded, the G module 310
accesses the look-up table (LUT) 316 to obtain appropriate multiplication factors
for each complex number. The multiplication factor for each complex number
may take on one of four possible values: +1, -1, +j or -j. For each complex
number, the multiplication is performed on both real and imaginary parts. The
results are then added to generate the output $G_\ell(n)$, which is pushed to the
output port. This process must be repeated sixteen times ($\ell$ ranging from 0 to
15) in order to generate the sixteen $G_\ell(n)$ values necessary for each output
sample.

[00038] In one implementation, the look-up table (LUT) 316 accessed by
the G module 310 can have sixteen entries. Two stimulus variables, namely
the port number (0, 1, 2 or 3) and the counter state n, may then be used to define the
value of the multiplication factor. A single local controller (not shown) may be 
used to select one set of multiplication factors from the LUT 316 and 
subsequently push them to the G module 310. Since the multiplication factors 
selected from the LUT 316 are determined by the two stimulus variables, the LUT 
316 may take the form of a truth table. 
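
One way to picture LUT 316 as a truth table is sketched below in Python. The encoding is hypothetical (factor names stored as strings, keyed by port number and counter state n); it relies only on the fact that port k applies $e^{jk\pi n/2} = j^{kn}$.

```python
# Hypothetical encoding of LUT 316: 16 entries keyed by (port number, counter state n).
# Port k multiplies its sample by e^{j k pi n / 2} = j**(k*n), one of four factors.
FACTORS = ["+1", "+j", "-1", "-j"]

LUT_316 = {(port, n): FACTORS[(port * n) % 4] for port in range(4) for n in range(4)}

for n in range(4):
    print(n, [LUT_316[(port, n)] for port in range(4)])
```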

[00039] The output $G_\ell(n)$ of the G module 310 is delivered to the GX
module 320. The GX module 320 is a complex multiplier used to generate
$R_\ell(n)$ from $G_\ell(n)$. The output of this module may be described by the complex
product

$$R_\ell(n) = P_\ell(n) \times G_\ell(n)$$

where two global input variables are defined as before, with n ranging from 0 to 3
and $\ell$ ranging from 0 to 15. The GX module 320 performs a complex number
multiplication of its received input $G_\ell(n)$ by a complex-valued constant
$P_\ell(n) = e^{j\pi\ell n/32}$, where values for $P_\ell(n)$ are stored in the corresponding
look-up table (LUT) 326.

[00040] In a specific implementation, the LUT 326 may comprise eight
predefined values of the form $e^{jm\pi/32}$ hard coded into the LUT block. Generating
these eight constants is sufficient since all other constants can be easily derived
based on these constants and the application of an appropriate multiplication
factor. Based on the value of the product $\ell \times n$, one of the eight values is
selected. The next step is to determine the multiplication factor, which can be
one of four possible numbers: +1, -1, +j or -j. In this manner, any constant
$P_\ell(n) = e^{j\pi\ell n/32}$ may be derived by performing a simple multiplication of a
selected one of the eight values in the LUT 326 by an appropriate multiplication factor.
The output $R_\ell(n)$ of the GX module 320 is subsequently delivered to the MAP
module 360. The output $R_\ell(n)$ is also routed to the GX1 module 330 and to the
GX2 module 340.

[00041] The GX1 module 330 is a complex multiplier used to perform the
complex multiplication of its received input $R_\ell(n)$ by a fixed complex-valued
constant, $e^{j8\pi/32}$. Mathematically, the output of the GX1 module 330 may be
described by the following product:

$$R_\ell(n) \times e^{j8\pi/32}$$

where $R_\ell(n)$ is the output from the GX module 320. The GX1 module is adapted
to access the storage unit 336 to obtain the complex-valued constant $e^{j8\pi/32}$.
The output of the GX1 module 330 is delivered to the MAP module 360.

[00042] The GX2 module 340 is also a complex multiplier used to perform
the complex multiplication of its received input $R_\ell(n)$ by two fixed complex-valued
constants, $e^{j4\pi/32}$ and $e^{j12\pi/32}$. Mathematically, the function of this module may
be described by the following products:

$$R_\ell(n) \times e^{j4\pi/32}, \qquad R_\ell(n) \times e^{j12\pi/32}$$

where $R_\ell(n)$ is the output from the GX module 320. The GX2 module 340 receives
the output $R_\ell(n)$ of the GX module 320 and is also adapted to receive the fixed
constant $e^{j4\pi/32}$ from the corresponding storage unit 346.

Those skilled in the art will appreciate that $e^{j12\pi/32}$ is the same as $e^{j4\pi/32}$
with the real and imaginary components reversed, since $\cos(3\pi/8) = \sin(\pi/8)$ and
$\sin(3\pi/8) = \cos(\pi/8)$. Therefore, once $R_\ell(n)$ has been multiplied by $e^{j4\pi/32}$,
the product of $R_\ell(n)$ and $e^{j12\pi/32}$ may be obtained by manipulating the result of
the first product, thereby eliminating the need to perform an extra multiplication.
Specifically, the second product $R_\ell(n) \times e^{j12\pi/32}$ may be obtained simply by
rearranging the manner in which the inner products resulting from the first
product, i.e. $R_\ell(n) \times e^{j4\pi/32}$, are added or subtracted. The results of these two
products are then delivered to the MAP module 360. Once the products of the
three complex multiplications performed by the GX module 320, the GX1 module
330 and the GX2 module 340 are generated, they are sent to the MAP module
360, where the product of $R_\ell(n)$ and $e^{j\pi\ell p/32}$ for any value of p (a multiple of 4)
and $\ell$ (an integer) can be derived.

[00043] The mathematical function performed by the MAP module 360 is to
compute the sixteen component values $R_\ell(n) \cdot e^{j\pi\ell p/32}$, with $\ell$ ranging from 0
to 15, for each required output time sample. The MAP module 360 in FIG. 3 is
adapted to receive four inputs corresponding to the complex products computed by
the complex multiplier modules 320, 330 and 340. In the embodiment of FIG. 3, the
MAP module 360 has sixteen outputs corresponding to sixteen distinct output
samples.

[00044] Each input port of the MAP module 360 receives a unique complex
number: $R_\ell(n)$ multiplied by a certain constant (1, $e^{j4\pi/32}$, $e^{j8\pi/32}$ or
$e^{j12\pi/32}$). For each value of $\ell$, with $\ell$ ranging from 0 to 15, sixteen component
products defined by $R_\ell(n) \cdot e^{j\pi\ell p/32}$ and corresponding to different output time
samples (granularity of 4) need to be computed and delivered to the output ports.
However, out of the sixteen component products which need computing, four have
already been computed. These are simply the four complex-valued inputs to the
MAP module 360. From these four inputs, any of the required sixteen component
products may be generated by a simple multiplication of one of the four inputs by
+1, -1, +j or -j. FIG. 4 depicts an example of the logic flow 400 which may be
undertaken in the MAP module 360 to arrive at one of sixteen component products.
As shown, a MUX stage 420 receives the four inputs from modules GX, GX1 and
GX2. Depending on the output port (p) of the MAP module 360 being considered and
the value of $\ell$, a complex product from one of the four input ports is selected and
forwarded to a Decision stage 440 where an appropriate multiplication factor is
applied. This process is implemented for each output port of the MAP module
360. The sixteen outputs 370 of the MAP module are subsequently delivered to
the accumulation module 380, whose functionality is described below.
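
The MUX-then-decision flow of FIG. 4 can be sketched in Python for N = 64. Assumptions: output port q carries the component for p = 4q, the four MAP inputs are ordered by angle, and `map_inputs` uses ordinary complex multiplication for brevity (it does not model the image-multiplication trick inside GX2). All names are hypothetical.

```python
import cmath

# 1, e^{j4pi/32}, e^{j8pi/32}, e^{j12pi/32}
CONSTANTS = [cmath.exp(1j * 4 * b * cmath.pi / 32) for b in range(4)]


def map_inputs(r):
    """Four MAP inputs for one clock cycle, ordered by angle: R (from GX),
    R*e^{j4pi/32} and R*e^{j12pi/32} (from GX2), R*e^{j8pi/32} (from GX1)."""
    return [r * c for c in CONSTANTS]


def j_power(z, t):
    """Return z * j**t using swaps and sign changes only (decision stage 440)."""
    z = complex(z)
    for _ in range(t % 4):
        z = complex(-z.imag, z.real)
    return z


def map_module(ell, inputs):
    """Sketch of MAP module 360 for N = 64: output port q carries
    R_l(n) * e^{j pi l p / 32} with p = 4q."""
    outputs = []
    for q in range(16):
        m = (ell * q) % 16                  # angle in units of 4*pi/32
        selected = inputs[m % 4]            # MUX stage 420
        outputs.append(j_power(selected, m // 4))
    return outputs
```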

[00045] The accumulation module 380 receives sixteen inputs from the
MAP module 360. At each input port, one complex number arrives every clock
cycle; over sixteen clock cycles, the sixteen component values corresponding to
the terms of equation (18) for $\ell$ ranging from 0 to 15 are summed together in a
register in order to generate one single output sample. This process occurs for
each of the sixteen input ports, resulting in sixteen output time samples being
computed in parallel. After the sixteen component values are summed for each
input port, the registers are cleared and the process is repeated for computation
of the next set of sixteen output time samples. A general depiction of the operation
performed by the accumulation module 380 is shown in FIG. 5.
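
Putting the modules together, the Python sketch below is a behavioural model of the FIG. 3 datapath for a 64-point IFFT, assuming one value of $\ell$ per clock cycle and the 1/N scaling of equation (1). $P_\ell(n)$ is computed with `cmath.exp` in place of LUT 326, GX2 forms its second product by image multiplication, and all names are hypothetical. It is meant for checking the dataflow and cycle count against equation (1), not as an RTL description.

```python
import cmath

N = 64
QUARTER = N // 4                            # 16 groups, and 16 values of l
C4 = cmath.exp(1j * 4 * cmath.pi / 32)      # constant used by GX2 (storage unit 346)
C8 = cmath.exp(1j * 8 * cmath.pi / 32)      # constant used by GX1 (storage unit 336)


def j_power(z, t):
    """Return z * j**t using swaps and sign changes only."""
    z = complex(z)
    for _ in range(t % 4):
        z = complex(-z.imag, z.real)
    return z


def fig3_ifft(x):
    """Cycle-level sketch of the FIG. 3 datapath for a 64-point IFFT; returns (y, cycles)."""
    if len(x) != N:
        raise ValueError("this sketch is fixed at N = 64")
    y = [0j] * N
    cycles = 0
    for n in range(4):                      # counter state n
        acc = [0j] * 16                     # accumulation module 380: one register per port
        for ell in range(QUARTER):          # one clock cycle per value of l
            # G module 310: X(l), X(l+16), X(l+32), X(l+48), port k scaled by j**(k*n)
            g = sum(j_power(x[QUARTER * k + ell], k * n) for k in range(4))
            # GX 320: R_l(n) = P_l(n) * G_l(n), with P_l(n) = e^{j pi l n / 32}
            r = cmath.exp(1j * cmath.pi * ell * n / 32) * g
            # GX1 330: R_l(n) * e^{j8pi/32}
            r8 = r * C8
            # GX2 340: four real products give both R*e^{j4pi/32} and R*e^{j12pi/32}
            p1, p2 = C4.real * r.real, C4.imag * r.imag
            p3, p4 = C4.imag * r.real, C4.real * r.imag
            r4 = complex(p1 - p2, p3 + p4)
            r12 = complex(p3 - p4, p2 + p1)
            inputs = [r, r4, r8, r12]
            # MAP module 360: port q needs R_l(n) * e^{j pi l (4q) / 32}
            for q in range(16):
                m = (ell * q) % 16          # angle in units of 4*pi/32
                acc[q] += j_power(inputs[m % 4], m // 4)
            cycles += 1
        for q in range(16):                 # outputs Y(n), Y(n+4), ..., Y(n+60)
            y[n + 4 * q] = acc[q] / N       # IFFT scaling by 1/N
    return y, cycles


x = [complex((7 * k) % 11 - 5, (3 * k) % 5) for k in range(N)]
y, cycles = fig3_ifft(x)
print(cycles)  # 64
```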

[00046] According to the embodiment in FIG. 3, using three (3) complex
multipliers allows for the generation of 16 output samples every 16 clock cycles.
Therefore, the total run time for applying the IFFT/FFT operation on the 64
complex-valued input samples is 64 clock cycles. Compared with the
brute force implementation depicted in FIG. 1, at the expense of adding two
complex multipliers, the total number of clock cycles required to compute the
64-point FFT/IFFT has been reduced by a factor of 64, from 64² = 4096 to 64.

[00047] Due to the similarity between the forward and inverse FFT (the 
IFFT differs from the FFT only by the sign of the exponent), the same module or 
circuitry with trivial modifications can be used for both modulation and 
demodulation in an OFDM transceiver. Although not shown, it should also be
noted that, depending on whether the FFT or IFFT is to be computed, the
accumulation module 380 treats its addition results differently. If the IFFT
operation is required, the final result of the addition for each output sample is
divided by the total number of samples (i.e. N). In the embodiment of FIG. 3,
assuming the IFFT operation is desired, for example, the results of each
accumulation would be divided by 64. If the FFT operation is desired, there is no division.
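
The two differences named above, the sign of the exponent and the division by N, can be captured in a direct reference transform. This Python sketch is a plain O(N²) evaluation, not the FIG. 3 engine. In hardware, the sign change would presumably be applied to the stored constants and quarter-turn factors, which this sketch does not model. The function `dft` and its `inverse` flag are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Direct transform: the IFFT uses e^{+j2pi kn/N} and divides by N;
    the FFT uses e^{-j2pi kn/N} and applies no division."""
    n_pts = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * n / n_pts) for k in range(n_pts))
           for n in range(n_pts)]
    return [v / n_pts for v in out] if inverse else out


x = [1, 2j, -3, 0.5]
roundtrip = dft(dft(x), inverse=True)
print(max(abs(a - b) for a, b in zip(roundtrip, x)) < 1e-12)  # True
```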

[00048] The advantage of an FFT/IFFT engine implemented with ASIC 
technology is that each of the functional blocks of the FFT/IFFT engine can be
mapped onto dedicated, parallel hardware resources, thereby avoiding the
difficult programming and optimization challenges of scheduling time-critical 
operations through a single DSP core. 

[00049] It should be noted that the LUTs and other modules which provide 
multiplicands to the complex multiplier modules can be termed multiplicand
generators as they provide multiplicands for the system. 

[00050] While preferred embodiments of the invention have been described 
and illustrated, it will be apparent to one skilled in the art that numerous 
modifications, variations and adaptations may be made without departing from 
the scope of the invention as defined in the claims appended hereto. 

[00051] Although various exemplary embodiments of the invention have 
been disclosed, it should be apparent to those skilled in the art that various 
changes and modifications can be made which will achieve some of the 
advantages of the invention without departing from the true scope of the 
invention. 

[00052] A person understanding this invention may now conceive of 
alternative structures and embodiments or variations of the above all of which are 
intended to fall within the scope of the invention as defined in the claims that 
follow. 